Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correctly escape quotes in field names #596

Merged
merged 2 commits into from
Sep 30, 2024
Merged

Conversation

Taepper
Copy link
Collaborator

@Taepper Taepper commented Sep 25, 2024

resolves #595

Summary

We did not correctly escape quotes in all cases. The detected error is added as a test and the corresponding bug is now fixed.

PR Checklist

- [ ] All necessary documentation has been adapted or there is an issue to do so.

  • The implemented feature is covered by an appropriate test.

Copy link
Contributor

github-actions bot commented Sep 25, 2024

This is a preview of the changelog of the next release. If this branch is not up-to-date with the current main branch, the changelog may not be accurate. Rebase your branch on the main branch to get the most accurate changelog.

Note that this might contain changes that are on main, but not yet released.

Changelog:

0.2.22 (2024-09-30)

Bug Fixes

  • correctly escape quotes in field names (9d37222)

Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting an error:

[2024-09-26 13:57:41.395] [logger] [debug] [preprocessing_database.cpp:73] Preprocessing Database - Result:
Parser Error: syntax error at or near ""lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_""
LINE 4: FROM (SELECT CASE WHEN pango_"lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_"lineage END AS partition_key, COUNT(*) AS count
      FROM metadata_table
      GROUP BY partition_key
      ORDER BY partition_key);
...
                                     ^

[2024-09-26 13:57:41.400] [logger] [error] [api.cpp:282] Parser Error: syntax error at or near ""lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_""
LINE 4: FROM (SELECT CASE WHEN pango_"lineage IS NULL THEN 'NULL'::VARCHAR ELSE '_' || pango_"lineage END AS partition_key, COUNT(*) AS count
      FROM metadata_table
      GROUP BY partition_key
      ORDER BY partition_key);
...
                                     ^

With testBaseData/exampleDataset/database_config.yaml changed to

schema:
  instanceName: sars_cov-2_minimal_test_config
  metadata:
    - name: gisaid_epi_isl
      type: string
    - name: date
      type: date
    - name: unsorted_date
      type: date
    - name: region
      type: string
      generateIndex: true
    - name: country
      type: string
      generateIndex: true
    - name: 'pango_"lineage'
      type: pango_lineage
    - name: division
      type: string
      generateIndex: true
    - name: age
      type: int
    - name: qc_value
      type: float
    - name: test_boolean_column
      type: boolean
  primaryKey: gisaid_epi_isl
  dateToSortBy: date
  partitionBy: 'pango_"lineage'
defaultNucleotideSequence: "main"

and the respective small_metadata_set.tsv header
gisaid_epi_isl pango_"lineage date region country division unsorted_date age qc_value test_boolean_column

src/silo/preprocessing/metadata_info.test.cpp Outdated Show resolved Hide resolved
src/silo/preprocessing/metadata_info.test.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@pflanze pflanze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong with your changes--assuming that explicit escaping is the proper pragmatic decision to be taken right now. The safer solution is to never create SQL strings but use prepared statements and let the database handle correct value inclusion. Apparently DuckDB supports them[1]. This will be a somewhat larger change, depending on whether you want to keep the prepared statements around (may or may not be relevant for performance). Considering our tight time budget right now I guess it's fair to postpone that change.

[1] https://duckdb.org/docs/sql/query_syntax/prepared_statements.html

I'm approving in case you want to defer the prepared statements, also I haven't understood the test name arguments and am not implying to override those.

@Taepper Taepper merged commit 5d908b2 into main Sep 30, 2024
10 checks passed
@Taepper Taepper deleted the 595-metadata-quotes branch September 30, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quotes in field names cause preprocessing errors
3 participants